Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research
Research into multi-modal perception, human cognition, behavior, and
attention can benefit from high-fidelity content that recreates
lifelike scenes when rendered on head-mounted displays. Moreover, aspects
of audiovisual perception, cognitive processes, and behavior may complement
questionnaire-based Quality of Experience (QoE) evaluation of interactive
virtual environments. Currently, there is a lack of high-quality open-source
audiovisual databases that can be used to evaluate such aspects or systems
capable of reproducing high-quality content. With this paper, we provide a
publicly available audiovisual database consisting of twelve scenes capturing
real-life nature and urban environments with a video resolution of 7680x3840 at
60 frames-per-second and with 4th-order Ambisonics audio. These 360 video
sequences, with an average duration of 60 seconds, represent real-life settings
for systematically evaluating various dimensions of uni-/multi-modal
perception, cognition, behavior, and QoE. The paper provides details of the
scene requirements, recording approach, and scene descriptions. The database
provides high-quality reference material with a balanced focus on auditory and
visual sensory information. The database will be continuously updated with
additional scenes and further metadata such as human ratings and saliency
information.
Comment: 6 pages, 2 figures, accepted and presented at the 2022 14th International Conference on Quality of Multimedia Experience (QoMEX). Database is publicly accessible at https://qoevave.github.io/database
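As a point of reference, the channel count of a full-sphere Ambisonics signal grows quadratically with the order, so the database's 4th-order audio comprises 25 channels. A minimal sketch of that relation (the function name is illustrative, not from the database tooling):

```python
def ambisonics_channels(order: int) -> int:
    """Number of audio channels for a full-sphere Ambisonics signal
    of the given order: (N + 1)^2."""
    return (order + 1) ** 2

# 4th-order Ambisonics, as used in the database
print(ambisonics_channels(4))  # 25
```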
Towards the Perception of Sound Source Directivity Inside Six-Degrees-of-Freedom Virtual Reality
Sound source directivity is a measure of the distribution of sound propagating from a source object. It is an essential component of how we perceive acoustic environments, interactions, and events. For six-degrees-of-freedom (6-DoF) virtual reality (VR), the combination of binaural audio and complete freedom of movement introduces new influencing elements into how we perceive source directivity. This preliminary study explores whether factors attributed to 6-DoF VR affect the way we perceive changes in simple sound source directivity. The study is divided into two parts. Part I comprises a control experiment in a non-VR monaural listening environment, in which the task is to ascertain difference limens between reference and test signals using a method-of-adjustment test. Based on the findings of Part I, Part II implements maximum attenuation thresholds on the same sound source directivity patterns using the same stimuli in 6-DoF VR. Results indicate that for critical steady-state signals, factors introduced by 6-DoF VR can mask our ability to detect loudness differences. Further analysis of the behavioral data acquired during Part II provides more insight into how subjects assess sound source directivity in 6-DoF VR.
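The abstract does not specify which directivity patterns were tested; a common parametric family for "simple" source directivity is the first-order pattern, sketched below under that assumption (`alpha` and the function name are illustrative):

```python
import math

def first_order_directivity(theta: float, alpha: float = 0.5) -> float:
    """Gain of a first-order directivity pattern at angle theta (radians)
    off the source's main axis. alpha=1 is omnidirectional, alpha=0.5
    a cardioid, alpha=0 a figure-of-eight."""
    return alpha + (1.0 - alpha) * math.cos(theta)

# Cardioid: unity gain on-axis, zero at the rear
print(first_order_directivity(0.0))      # 1.0
print(first_order_directivity(math.pi))  # 0.0
```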
Enhanced Immersion for Binaural Audio Reproduction of Ambisonics in Six-Degrees-of-Freedom: The Effect of Added Distance Information
The immersion of the user is of key interest in the reproduction of acoustic scenes in virtual reality. It is enhanced when movement is possible in six degrees of freedom, i.e., three rotational plus three translational degrees. Immersion can be enhanced further when the user is not only able to move between distant sound sources, but can also move towards and behind close sources. In this paper, we employ a reproduction method for Ambisonics recordings from a single position that uses meta information on the distance of the sound sources in the recorded acoustic scene. A subjective study investigates the benefit of this distance information. Different spatial audio reproduction methods are compared in a multi-stimulus test. Two synthetic scenes are contrasted: one with close sources the user can walk around, and one with faraway sources that cannot be reached. We found that for both close and distant sources, loudness changing with distance enhances the experience. In the case of close sources, the use of correct distance information was found to be important.
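The distance-dependent loudness discussed above is typically modeled with an inverse-distance (1/r) gain law; a minimal sketch, assuming a 1 m reference distance and a clamp for very close sources (both parameters are illustrative, not from the paper):

```python
def distance_gain(distance_m: float, ref_distance_m: float = 1.0,
                  min_distance_m: float = 0.1) -> float:
    """Inverse-distance (1/r) amplitude gain relative to a reference
    distance, clamped near the source so the gain stays bounded when
    the listener walks through a close source."""
    d = max(distance_m, min_distance_m)
    return ref_distance_m / d

print(distance_gain(2.0))  # 0.5, i.e. -6 dB per doubling of distance
```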
Listening Tests with Individual versus Generic Head-Related Transfer Functions in Six-Degrees-of-Freedom Virtual Reality
Individual head-related transfer functions (HRTFs) improve localization accuracy and externalization in binaural audio reproduction compared to generic HRTFs. Listening tests are often conducted with generic HRTFs due to the difficulty of obtaining individual HRTFs for all participants. This study explores the ramifications of the choice of HRTFs for critical listening in a six-degrees-of-freedom audio-visual virtual environment, where participants perform an overall audio quality evaluation task. The study consists of two sessions using either individual or generic HRTFs. A small effect between the sessions is observed in a condition where elevation cues are impaired; other conditions are rated similarly with individual and generic HRTFs.
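Binaural reproduction with either individual or generic HRTFs reduces, per source direction, to convolving the mono source signal with a left/right pair of head-related impulse responses (HRIRs). A minimal pure-Python sketch with toy HRIRs (not measured data):

```python
def convolve(signal, ir):
    """Direct-form FIR convolution of a mono signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def binaural_render(signal, hrir_left, hrir_right):
    """Render a mono source to a binaural pair by convolving it with
    the left- and right-ear HRIRs for its direction."""
    return convolve(signal, hrir_left), convolve(signal, hrir_right)

# Toy HRIRs for a source on the left: right ear delayed and attenuated
left, right = binaural_render([1.0, 0.5], [1.0], [0.0, 0.8])
print(left, right)  # [1.0, 0.5] [0.0, 0.8, 0.4]
```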
Quality of experience in telemeetings and videoconferencing: a comprehensive survey
Telemeetings such as audiovisual conferences or virtual meetings play an increasingly important role in our professional and private lives. For that reason, system developers and service providers will strive for an optimal experience for the user, while at the same time optimizing technical and financial resources. This leads to the discipline of Quality of Experience (QoE), an active field originating from the telecommunication and multimedia engineering domains, that strives for understanding, measuring, and designing the quality experience with multimedia technology. This paper provides the reader with an entry point to the large and still growing field of QoE of telemeetings, by taking a holistic perspective, considering both technical and non-technical aspects, and by focusing on current and near-future services. Addressing both researchers and practitioners, the paper first provides a comprehensive survey of factors and processes that contribute to the QoE of telemeetings, followed by an overview of relevant state-of-the-art methods for QoE assessment. To embed this knowledge into recent technology developments, the paper continues with an overview of current trends, focusing on the field of eXtended Reality (XR) applications for communication purposes. Given the complexity of telemeeting QoE and the current trends, new challenges for a QoE assessment of telemeetings are identified. To overcome these challenges, the paper presents a novel Profile Template for characterizing telemeetings from the holistic perspective endorsed in this paper
No dynamic visual capture for self-translation minimum audible angle
Auditory localization is affected by visual cues. This study focuses on a scenario where dynamic sound localization cues are induced by lateral listener self-translation relative to a stationary sound source, with matching or mismatching dynamic visual cues. The audio-only self-translation minimum audible angle (ST-MAA) was previously shown to be 3.3° in the horizontal plane in front of the listener. The present study found that the addition of visual cues has no significant effect on the ST-MAA.
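For a frontal source, the reported 3.3° ST-MAA can be translated into the lateral self-translation needed to produce a just-noticeable change of direction; a small geometric sketch, assuming a stationary source directly ahead at a known distance (the 2 m distance is illustrative):

```python
import math

def min_translation_for_angle(source_distance_m: float,
                              maa_deg: float = 3.3) -> float:
    """Lateral self-translation (in meters) that changes the direction
    of a frontal source at the given distance by the minimum audible
    angle: dx = d * tan(theta)."""
    return source_distance_m * math.tan(math.radians(maa_deg))

print(round(min_translation_for_angle(2.0), 3))  # ~0.115 m
```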
Perceptual Study of Near-Field Binaural Audio Rendering in Six-Degrees-of-Freedom Virtual Reality
Auditory localization cues in the near-field (<1.0 m) differ significantly from those in the far-field. The near-field region is within an arm's length of the listener, allowing proprioceptive cues to be integrated when determining the location of an object in space. This perceptual study compares three non-individualized methods of applying head-related transfer functions (HRTFs) in six-degrees-of-freedom near-field audio rendering: far-field measured HRTFs, multi-distance measured HRTFs, and spherical-model-based HRTFs with near-field extrapolation. To set our findings in context, we provide a real-world hand-held audio source for comparison, along with a distance-invariant condition. Two modes of interaction are compared in an audio-visual virtual reality: one allowing the participant to move the audio object dynamically, and the other with a stationary audio object but a freely moving listener.
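The paper compares multi-distance measured HRTFs among other methods; how the measured distances are combined at render time is not described in the abstract, but a minimal baseline is nearest-neighbour selection of the measured distance, sketched here as an assumption:

```python
def select_hrtf_distance(source_distance_m, measured_distances_m):
    """Pick the measured HRTF distance closest to the current source
    distance: a simple nearest-neighbour strategy for a multi-distance
    HRTF set (real renderers may interpolate between sets instead)."""
    return min(measured_distances_m, key=lambda d: abs(d - source_distance_m))

# Hypothetical measurement distances of 0.25 m, 0.5 m, and 1.0 m
print(select_hrtf_distance(0.4, [0.25, 0.5, 1.0]))  # 0.5
```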